The Maximum Entropy principle and the nature of fractals 
We apply the Principle of Maximum Entropy to the study of a general class of deterministic fractal sets. The scaling laws peculiar to these objects are accounted for by means of a constraint concerning the average information content of those patterns. This constraint allows for a new statistical characterization of fractal objects and fractal dimension.
I. INTRODUCTION



The frequent emergence of fractal objects [1,2] in Nature is a highly relevant issue in contemporary physics. Since the publication of the celebrated work of B. B. Mandelbrot, fractal geometry has indeed been used successfully to describe these patterns in physics [3] and in natural morphogenesis. However, we still do not know why fractals are so strikingly frequent and stable in Nature.

In a recent paper [6], a tentative argument was proposed to account for the fractal nature of diffusion-limited aggregation (DLA) clusters [7]. In that paper, the branching structure of DLA clusters was analyzed within a new framework, in which the emphasis was laid not on the order of the branches (as had traditionally been done [8]) but on their mass. In this approach, a new magnitude was introduced, the branch distribution $n(s,M)$, defined as the average number of branches of a given mass $s$ present in a typical cluster of $M$ particles. From numerical simulations it was found that, in the limit of large branches, this distribution is a scaling function of $s$, namely $n(s,M)/M \sim s^{-\alpha}$. This particular functional form (a power law), which turns out to be universal in fractal sets, was accounted for by the Maximum Entropy (MaxEnt) principle.

The Theory of Information [9,10] and the Maximum Entropy formalism [11] are well-known methods in statistical physics and time-honored tools for the study of complex systems, especially in those stationary situations where one faces a considerable lack of information. These techniques have found a wide range of applications, among which we can mention the prediction of average magnitudes in statistical mechanics (where there is an extreme lack of information) [11], the analysis of signals hidden in noise, the prediction of ecological stationary states [13], models of growth and differentiation, etc. (For a general review of applications of the MaxEnt principle, see for instance Refs. [13] and [14].)

Within the general approach of the MaxEnt formalism, a complex system $X$ is represented by a set of $n$ nodes, characterized by a certain extensive magnitude $x$. The description of the system is accomplished by means of a distribution of structural probabilities, which assigns a probability $p(x_i)$ to each of the nodes $i = 1,\dots,n$. In some cases, the structural probabilities have a truly physical meaning; an example is provided by the application of MaxEnt to ecological energy-flow networks [13], where the nodes represent compartments in a biologically meaningful partition of the ecosystem and the structural probabilities represent fractions of energy exchanged between different nodes. In other cases, however, the structural probabilities have a merely statistical interpretation. For instance, $p(x_i)$ can represent the probability of node $i$ possessing a given amount $x_i$ of the characteristic magnitude $x$; we clearly recognize here the analogy with statistical mechanics, where the nodes are the different states accessible to the system and the role of $x$ is played by the energy of those levels. Whatever the interpretation of the probabilities $p(x_i)$, the system $X$ is globally characterized by its total entropy, measured by the Shannon formula (expressed in nats) [9,10]

$$H(X) = -\sum_{i=1}^{n} p(x_i)\ln p(x_i). \tag{1.1}$$



On this basis, the MaxEnt principle as stated by Jaynes [9,11,14] says: "The least biased and most likely probability assignment $p(x_i)$ is that which maximizes the total entropy $H(X)$ subject to the constraints imposed on the system." The interpretation of this procedure is quite clear [15]. The magnitude (1.1) is a measure of the amount of uncertainty in our description of the system $X$ through the distribution $p(x_i)$. Therefore, the distribution that describes $X$ in the least biased way, given the information available about the system and without assuming anything else, is the one that maximizes the entropy $H(X)$ subject to the constraints imposed by that very information. Besides the trivial normalization condition $\sum_{i=1}^{n} p(x_i) = 1$, these constraints have to be understood as the global effect of the fundamental laws involved in the process.
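As a minimal illustration of eq. (1.1), the Python sketch below (function and variable names are ours, not the paper's) computes the Shannon entropy in nats and checks the textbook fact that, when normalization is the only constraint, the uniform distribution attains the maximum value $\ln n$.

```python
import math

def shannon_entropy(p):
    """Shannon entropy H = -sum p ln p in nats (eq. 1.1); 0 ln 0 is taken as 0."""
    if abs(sum(p) - 1.0) > 1e-12:
        raise ValueError("probabilities must be normalized")
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

n = 4
uniform = [1.0 / n] * n          # MaxEnt answer when only normalization is imposed
skewed = [0.4, 0.3, 0.2, 0.1]    # an arbitrary alternative assignment

print(shannon_entropy(uniform), math.log(n))               # both equal ln 4
print(shannon_entropy(skewed) < shannon_entropy(uniform))  # the uniform one is larger
```

Any additional constraint, such as a fixed average of some magnitude, pulls the maximizer away from the uniform assignment; that is the mechanism exploited in the rest of the paper.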

In this view, the branching structure of DLA discussed in Ref. [6] was explained by imposing a constraint on the average generating information (to be defined later on) in the set of branches of the ensemble of all DLA clusters of a given mass $M$. This generating information was interpreted as the information required to fully specify the structure of a generic cluster, and is therefore related to the algorithmic complexity [16] of the DLA process.

The purpose of the present paper is to propose a new, generalized Maximum Entropy argument in order to account for a wide class of deterministic fractal structures that can be defined by an iterative process. Examples of such structures are found in the construction of fractal sets by means of Iterated Function Systems [17]. In Section II we study a simple model of a one-level iterative construction. We apply the MaxEnt principle to it, using as a constraint a suitable expression for the generating information, which yields, as expected, a power-law behaviour. An example of this model is analyzed, showing a thermodynamic analogy which allows a new statistical interpretation of fractal sets and fractal dimension. Section III generalizes the previous situation, considering a more complex two-level model closely related to the DLA process discussed in Ref. [6]; an example is also provided and analyzed. Finally, we present our conclusions in Section IV.



II. ONE-LEVEL ITERATIVE MODEL 

Let us first consider the model of a simple iterative process $V$, whose iteration through a given number of levels (labelled with the index $k$) leads to some final pattern. In the $k$-th level we have a structure composed of $n_k$ identical elements of order $k$; $n_k$ is the occupation number of level $k$. Each of those order-$k$ elements is characterized exclusively by a certain magnitude $\ell_k$, which we measure in units of a certain atom $\varepsilon$ (the resolution). If $\ell_k$ is a length, then $\varepsilon$ corresponds to the so-called lower cutoff length. However, in order to allow for the possibility of characterizing the levels by any other extensive magnitude, we will keep the more general term atom.

In order to apply the MaxEnt formalism, we need to associate with $V$ some meaningful structural probabilities. We choose a distribution that assigns to each level $k$ a probability $p_k$ proportional to its occupation number, $p_k = n_k/\sum_{k'} n_{k'}$. Given the final structure, constructed through an increasing sequence of nested levels, $p_k$ is the weighted probability of any element in the sequence belonging to order $k$. This distribution is trivially normalized to 1. The total entropy of $V$ is then given by

$$H(V) = -\sum_k p_k\ln p_k. \tag{2.1}$$



This function is a measure of the diversity of the iteration 
process throughout its whole history, taking into account 
the population density of each one of its levels. 

We next identify the constraints imposed on the system. Let us consider the very nature of the iterative process, which we assume leads to a final fractal pattern. This fractal limit is essentially characterized by its self-similarity [1]. This concept is linked to the fact that the structure of the whole object is similar (at least in some statistical sense) to that of a small part of it. In other words, since the whole can be reproduced from any small part of it, the amount of information (generating information) needed to reconstruct the object (measured in some suitable way) is small and nearly equal to the information contained in any small portion of it. The situation is much the same for Iterated Function Systems [17]. In that case, all the information required for the construction of a complex deterministic fractal is compressed into a small number of contraction functions and, in the simplest case, into a small set of real numbers. Hence, these fractals do indeed contain a small amount of information.

In order to express this information content, let us consider an element of order $k$, characterized by the magnitude $\ell_k$. This element is therefore composed of $N_k = \ell_k/\varepsilon$ atoms of size $\varepsilon$, arranged in a certain way. If atoms are considered to be indistinguishable, then the amount of information needed to specify the arrangement of those $N_k$ atoms in the element is the same as that involved in the problem of selecting one among $N_k$ objects with equal probability. Elementary information theory [9] tells us that this information is $\ln N_k$. Accordingly, the amount of generating information used up for the specification of an element of order $k$ is $I_k = \ln N_k = \ln(\ell_k/\varepsilon)$. The average information over the whole iteration process $V$ is then

$$\langle I\rangle = \sum_k p_k I_k = \sum_k p_k\ln\frac{\ell_k}{\varepsilon}. \tag{2.2}$$



At this point we should stress the fundamental difference between magnitudes (2.1) and (2.2). Both are informations, in the strict sense of the mathematical Theory of Information. However, the entropy (2.1) refers to the diversity of an iterative process, while (2.2) is a tentative measure of what we have to know in order to reproduce that very iterative process.
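To make the distinction between (2.1) and (2.2) concrete, here is a small sketch that evaluates both quantities from a list of occupation numbers $n_k$ and relative sizes $\ell_k/\varepsilon$. The input (a von Koch-like construction with $N = 4$, $1/r = 3$, stopped at $m = 3$) and the helper names are illustrative assumptions, not data from the paper.

```python
import math

def structural_probabilities(n):
    """p_k = n_k / sum_k' n_k' (occupation-number weighting of Section II)."""
    total = sum(n)
    return [nk / total for nk in n]

def process_entropy(p):
    """Eq. (2.1): H(V) = -sum_k p_k ln p_k, the diversity of the iteration."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

def average_generating_information(p, sizes):
    """Eq. (2.2): <I> = sum_k p_k ln(l_k / eps); `sizes` holds l_k / eps."""
    return sum(pk * math.log(s) for pk, s in zip(p, sizes))

# Hypothetical input: von Koch-like levels with N = 4, 1/r = 3, stopped at m = 3.
N, r_inv, m = 4, 3, 3
n = [N ** k for k in range(m + 1)]               # occupation numbers n_k = N^k
sizes = [r_inv ** (m - k) for k in range(m + 1)]  # l_k / eps = (1/r)^(m - k)

p = structural_probabilities(n)
H = process_entropy(p)
I_avg = average_generating_information(p, sizes)
print(H, I_avg)
```

The deepest level ($k = m$, the atom itself) contributes nothing to $\langle I\rangle$, since $\ln 1 = 0$, while it carries the largest weight in $H(V)$.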

Our fundamental assumption concerning $V$ is that the relation $\langle I\rangle = I = \text{const.}$ works as a constraint acting on the process and that there are no further relevant constraints playing any essential role. The MaxEnt principle is then applied by maximizing the total entropy (2.1), subject to the constraint that the average generating information (2.2) is constant. The maximization is performed by means of the Lagrange multipliers method [18], computing the extrema of the auxiliary function

$$\Phi = -\sum_k p_k\ln p_k - \beta\sum_k p_k\ln\frac{\ell_k}{\varepsilon} - \beta'\sum_k p_k,$$

where $\beta$ and $\beta'$ are Lagrange multipliers. The extremum (more precisely, the maximum), given by the condition $\partial\Phi/\partial p_k = 0$, yields the equation

$$-\ln p_k - 1 - \beta\ln\frac{\ell_k}{\varepsilon} - \beta' = 0,$$

from which we obtain the following expression for the structural probabilities:

$$p_k = e^{-1-\beta'}\left(\frac{\ell_k}{\varepsilon}\right)^{-\beta}.$$

This equation, together with the relation $p_k \propto n_k$, provides the functional form of the occupation numbers

$$n_k = \text{const.}\times\left(\frac{\ell_k}{\varepsilon}\right)^{-\beta}. \tag{2.3}$$



In other words, the occupation numbers scale as a power of $\ell_k$, which implies a self-similar behaviour of those magnitudes with respect to that variable [1]. The interpretation of $\beta$ is linked to the fractal dimension of the final pattern, as will become clear later on.
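The claim that this stationary point is a maximum can be checked numerically. In the sketch below (an illustration with hypothetical levels and a hypothetical $\beta$, not values from the paper) we build the MaxEnt distribution $p_k \propto (\ell_k/\varepsilon)^{-\beta}$ for three levels and perturb it along a direction that keeps both the normalization and $\langle I\rangle$ fixed; by strict concavity of the entropy, every such perturbation should lower $H$.

```python
import math

def entropy(p):
    """Shannon entropy in nats, as in eq. (2.1)."""
    return -sum(x * math.log(x) for x in p)

# Hypothetical information levels I_k = ln(l_k/eps) and multiplier beta.
levels = [0.0, math.log(3), math.log(9)]
beta = 1.2

# MaxEnt form derived above: p_k proportional to (l_k/eps)^(-beta) = exp(-beta I_k).
weights = [math.exp(-beta * I) for I in levels]
p = [w / sum(weights) for w in weights]
I_ref = sum(pk * I for pk, I in zip(p, levels))

# A perturbation that preserves both constraints must be orthogonal to
# (1, 1, 1) (normalization) and to (I_0, I_1, I_2) (average information);
# in three dimensions their cross product is such a direction.
a, b = [1.0, 1.0, 1.0], levels
delta = [a[1] * b[2] - a[2] * b[1],
         a[2] * b[0] - a[0] * b[2],
         a[0] * b[1] - a[1] * b[0]]

checks = []
for t in (0.01, -0.01, 0.02, -0.02):
    q = [pk + t * dk for pk, dk in zip(p, delta)]
    same_constraints = (abs(sum(q) - 1) < 1e-12 and
                        abs(sum(x * I for x, I in zip(q, levels)) - I_ref) < 1e-12)
    checks.append(all(x > 0 for x in q) and same_constraints and entropy(q) < entropy(p))
print(checks)   # every perturbed distribution has lower entropy
```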

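This result can be readily interpreted in a statistical framework by imposing the normalization condition $\sum_k p_k = 1$, which yields

$$p_k = \frac{(\ell_k/\varepsilon)^{-\beta}}{\sum_{k'}(\ell_{k'}/\varepsilon)^{-\beta}}. \tag{2.4}$$

By introducing the partition function

$$Z(\beta) = \sum_k\left(\frac{\ell_k}{\varepsilon}\right)^{-\beta} = \sum_k\exp\left[-\beta\ln\frac{\ell_k}{\varepsilon}\right], \tag{2.5}$$

the structural probabilities are given by

$$p_k = \frac{(\ell_k/\varepsilon)^{-\beta}}{Z(\beta)}, \tag{2.6}$$

and the average information (2.2) is given by

$$\langle I\rangle = -\frac{\partial\ln Z(\beta)}{\partial\beta}. \tag{2.7}$$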



That is, the iterative process is analogous to a canonical ensemble from statistical mechanics defined by a "spectrum" of information levels $I_k = \ln(\ell_k/\varepsilon)$. The partition function of this ensemble is $Z(\beta)$, with a "temperature" $1/\beta$ which can in principle be computed from eq. (2.7) if the actual value $I$ of the constraint is known.
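Eq. (2.7) only fixes $\beta$ implicitly. A minimal sketch of how one might solve it numerically, assuming a finite list of levels $I_k$ and using plain bisection (our choice of method and of function names; the text only states that $\beta$ can be computed in principle):

```python
import math

def mean_information(beta, levels):
    """Eq. (2.7): <I> = -d ln Z/d beta = sum_k I_k exp(-beta I_k) / Z(beta).

    The exponents are shifted by their maximum to avoid overflow.
    """
    exps = [-beta * I for I in levels]
    top = max(exps)
    w = [math.exp(e - top) for e in exps]
    return sum(I * wk for I, wk in zip(levels, w)) / sum(w)

def solve_beta(target, levels, lo=-50.0, hi=50.0, tol=1e-12):
    """Invert <I>(beta) = target by bisection on [lo, hi].

    d<I>/d beta = -Var(I) < 0 for non-degenerate levels, so <I> decreases
    monotonically and the root inside the bracket is unique.
    """
    if not mean_information(hi, levels) < target < mean_information(lo, levels):
        raise ValueError("target is not attainable inside the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_information(mid, levels) > target:
            lo = mid      # <I> still too large: the root lies at larger beta
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Usage with hypothetical, equally spaced levels I_k = k ln 3 (k = 0..5):
levels = [k * math.log(3) for k in range(6)]
beta_true = math.log(4) / math.log(3)
target = mean_information(beta_true, levels)
print(solve_beta(target, levels), beta_true)
```

Monotonicity of $\langle I\rangle(\beta)$ is what makes a bracketing method safe here; any other one-dimensional root finder would do equally well.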

As an example of this kind of structure, let us consider the iterative construction of the von Koch curve [1], the first steps of which are depicted in Fig. 1. The process uses two simple forms: an initial straight segment of length 1 (stage $k = 0$), the initiator, and a generator. The latter is a broken line made up of $N$ segments of equal length $r < 1$. Every iteration starts with a broken line and proceeds by replacing every rectilinear segment with a reduced copy of the generator, rotated in such a way that it has the same endpoints as the interval being replaced. In the $k$-th level of the process, we have a broken line made up of $n_k = N^k$ equal straight segments of length $\ell_k = r^k$, which we identify as the elements of level $k$. Iterating this procedure infinitely many times leads to a strictly self-similar fractal curve with fractal dimension $D = \ln N/\ln(1/r)$ [1]. On the other hand, if the iteration is stopped after $m + 1$ steps, we obtain a final structure composed of elements of minimum length $\ell_m = r^m$, which we take as the atom (in this case, the cutoff length) $\varepsilon$, so that $\ell_k/\varepsilon = r^k/r^m$. By taking logarithms in both expressions $n_k$ and $\ell_k/\varepsilon$ and removing the dependence on $k$, we get

$$n_k = N^m\left(\frac{\ell_k}{\varepsilon}\right)^{-D}, \qquad k = 0,\dots,m. \tag{2.8}$$

FIG. 1. First four iterations in the construction of a von Koch curve with $N = 4$ and $r = 1/3$, $D = \ln 4/\ln 3 = 1.26186$.

The occupation numbers fit the power-law form predicted by MaxEnt with the imposed informational constraint. By comparing this expression with the general form (2.3), we can recognize the meaning of the Lagrange multiplier $\beta$: it corresponds to the fractal dimension $D$ of the set.
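A quick numerical check of eq. (2.8): the sketch below (our variable names; the cutoff $m = 8$ is an arbitrary choice) fits a straight line to $\ln n_k$ versus $\ln(\ell_k/\varepsilon)$ for the curve of Fig. 1 and compares the slope with $-D$.

```python
import math

N, r, m = 4, 1 / 3, 8        # von Koch generator of Fig. 1, stopped at level m
ks = range(m + 1)
n = [N ** k for k in ks]                  # occupation numbers n_k = N^k
size = [r ** k / r ** m for k in ks]      # l_k / eps = r^k / r^m

# Least-squares slope of ln n_k against ln(l_k/eps).
x = [math.log(s) for s in size]
y = [math.log(v) for v in n]
xm, ym = sum(x) / len(x), sum(y) / len(y)
slope = sum((a - xm) * (b - ym) for a, b in zip(x, y)) / sum((a - xm) ** 2 for a in x)

D = math.log(N) / math.log(1 / r)
print(slope, -D)   # slope = -D, i.e. the Lagrange multiplier beta equals D
```

Because the relation is exactly linear in logarithms, the fit is exact up to rounding; with a random fractal the same regression would only give an estimate.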



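By substituting $\ell_k/\varepsilon = r^{k-m}$ into eq. (2.5), the partition function for the $m$-th iteration reads

$$Z_m = \sum_{k=0}^{m}\exp\left[-\beta(k-m)\ln r\right] = \sum_{k=0}^{m}\exp\left[-\beta k\ln(1/r)\right], \tag{2.9}$$

where the second form follows from relabelling $k \to m - k$.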
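The relabelling in (2.9) and the approach of the finite-$m$ average information to its $m \to \infty$ limit can be checked with a few lines of Python. This is a sketch with our own function names; we set $\beta = D = \ln 4/\ln 3$, as for the curve of Fig. 1.

```python
import math

def Z_m(beta, r, m):
    """Eq. (2.9), first form: sum_{k=0}^{m} exp[-beta (k - m) ln r]."""
    return sum(math.exp(-beta * (k - m) * math.log(r)) for k in range(m + 1))

def Z_m_relabelled(beta, r, m):
    """Eq. (2.9), second form, obtained by relabelling k -> m - k."""
    return sum(math.exp(-beta * k * math.log(1 / r)) for k in range(m + 1))

def mean_I_m(beta, r, m):
    """<I> for the equally spaced levels k ln(1/r), k = 0..m (relabelled indexing)."""
    dI = math.log(1 / r)
    w = [math.exp(-beta * k * dI) for k in range(m + 1)]
    return sum(k * dI * wk for k, wk in enumerate(w)) / sum(w)

N, r = 4, 1 / 3
dI = math.log(1 / r)
beta = math.log(N) / dI                  # beta = D for the von Koch curve
for m in (2, 5, 10, 40):
    print(m, Z_m(beta, r, m), Z_m_relabelled(beta, r, m), mean_I_m(beta, r, m))
print("m -> infinity:", dI / (math.exp(beta * dI) - 1))   # eq. (2.11)
```

For this curve the limit equals $\ln 3/3 \approx 0.366$, and the finite-$m$ values approach it quickly because the weights decay as $4^{-k}$.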



For the von Koch curve, the information constraint (2.7) is expressed as a condition on the average information in a canonical ensemble given by a "spectrum" of information levels $I_k = k\ln(1/r)$, an inverse temperature $\beta = D$, and the partition function (2.9). Its statistical properties are therefore formally analogous to those of the quantum harmonic oscillator, with a spectrum of energy levels $E_n = \hbar\omega(n + 1/2)$ [18]. The information levels $I_k$ are equally spaced, and they represent the value of the information in each of the iterative steps leading to the von Koch curve. At each of these steps the information increases by a constant quantity $\Delta I = I_{k+1} - I_k = \ln(1/r)$. This increment stands for the amount of information needed to push the iteration from one level to the next. The fact that the increment of information is constant allows a formal computation of the Lagrange multiplier $\beta$ (the fractal dimension) as a function of the actual information content $I$. Consider the limit $m \to \infty$ in eq. (2.9). We obtain the partition function for the final, strictly self-similar von Koch curve, namely



$$Z(\beta) = \lim_{m\to\infty} Z_m = \frac{1}{1-e^{-\beta\ln(1/r)}}. \tag{2.10}$$

The average information (2.7) is then equal to

$$\langle I\rangle = \frac{\Delta I}{e^{\beta\Delta I}-1}, \tag{2.11}$$



where we have introduced $\Delta I = \ln(1/r)$. Knowledge of the actual value $I$ of the average information content finally yields the following closed expression for the fractal dimension:

$$D = \frac{1}{\Delta I}\ln\left(1 + \frac{\Delta I}{I}\right). \tag{2.12}$$



That is, we have been able to deduce an exact expression for the fractal dimension $D$, defined as the Lagrange multiplier $\beta$ in eq. (2.3), which involves only the informational magnitudes of the iterative process leading to the fractal pattern.
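Eqs. (2.11) and (2.12) are inverses of each other, which the following sketch (our function names) checks for the von Koch curve of Fig. 1. Using `math.expm1` and `math.log1p` is simply a numerical-accuracy choice. Because $e^{D\Delta I} = N$ for this curve, eq. (2.11) gives $\langle I\rangle = \ln(1/r)/(N-1) = \ln 3/3$.

```python
import math

def dimension_from_information(I, dI):
    """Eq. (2.12): D = (1/dI) ln(1 + dI/I), one-level (non-degenerate) model."""
    return math.log1p(dI / I) / dI

def information_from_dimension(D, dI):
    """Eq. (2.11) with beta = D: <I> = dI / (e^{D dI} - 1)."""
    return dI / math.expm1(D * dI)

N, r = 4, 1 / 3                    # von Koch curve of Fig. 1
dI = math.log(1 / r)
I = information_from_dimension(math.log(N) / dI, dI)
print(I, dI / (N - 1))             # <I> = ln(1/r) / (N - 1) for this curve
print(dimension_from_information(I, dI), math.log(N) / math.log(3))
```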



III. TWO-LEVEL ITERATIVE MODEL 

The former model can be extended in order to encompass a more general class of iterative processes. Let us consider a two-level process $V^*$, in which each iteration order $k$ is composed of elements belonging to different classes (which we label with an index $i$) in a number $n_k(i)$; that is, there are $n_k(i)$ elements of class $i$ within order $k$. Each of these elements is characterized exclusively by a certain magnitude $\ell_k(i)$. We define our structural probabilities as the probability of a given element being of class $i$ and belonging to order $k$, namely $p_k(i) = n_k(i)/\sum_{k',i'} n_{k'}(i')$. Similarly, we define the probability of an element being of class $i$ given that it belongs to order $k$ by $p(i|k) = n_k(i)/\sum_{i'} n_k(i')$, and the probability of an element belonging to order $k$ irrespective of its class by $p(k) = \sum_{i'} n_k(i')/\sum_{k',i'} n_{k'}(i')$. These distributions trivially fulfil the relation $p_k(i) = p(k)\,p(i|k)$. Given the structural probabilities $p_k(i)$, $V^*$ is assigned a total entropy

$$H(V^*) = -\sum_{k,i} p_k(i)\ln p_k(i). \tag{3.1}$$



In order to determine the informational constraint, consider an element of order $k$ and class $i$, characterized by a value $\ell_k(i)$. If it is composed of $\ell_k(i)/\varepsilon$ indistinguishable atoms of size $\varepsilon$, then we associate with it an information content $I_k(i) = \ln(\ell_k(i)/\varepsilon)$. An average over classes $i$ provides the average conditional generating information of level $k$

$$\langle I_k\rangle = \sum_i p(i|k)\,I_k(i) = \sum_i p(i|k)\ln\frac{\ell_k(i)}{\varepsilon}. \tag{3.2}$$



A subsequent average over orders yields the global average information of $V^*$

$$\langle I\rangle = \sum_k p(k)\langle I_k\rangle = \sum_{k,i} p_k(i)\ln\frac{\ell_k(i)}{\varepsilon}. \tag{3.3}$$
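The two averaging routes in (3.2) and (3.3) agree because $p_k(i) = p(k)\,p(i|k)$. A quick numerical check with made-up occupation numbers and sizes (purely hypothetical values, not a specific fractal; names are ours):

```python
import math

# Hypothetical occupation numbers n_k(i) and sizes l_k(i)/eps, keyed by (k, i).
n = {(1, 1): 8, (2, 1): 8, (2, 2): 3, (3, 1): 8, (3, 2): 3, (3, 3): 1}
size = {(1, 1): 2, (2, 1): 6, (2, 2): 2, (3, 1): 18, (3, 2): 6, (3, 3): 2}

total = sum(n.values())
p_joint = {ki: v / total for ki, v in n.items()}                  # p_k(i)
orders = sorted({k for k, _ in n})
p_order = {k: sum(v for (kk, _), v in p_joint.items() if kk == k)  # p(k)
           for k in orders}

def cond_info(k):
    """Eq. (3.2): <I_k> = sum_i p(i|k) ln(l_k(i)/eps), p(i|k) = n_k(i)/sum_i' n_k(i')."""
    nk = sum(v for (kk, _), v in n.items() if kk == k)
    return sum(v / nk * math.log(size[(kk, i)]) for (kk, i), v in n.items() if kk == k)

I_two_step = sum(p_order[k] * cond_info(k) for k in orders)          # sum_k p(k) <I_k>
I_direct = sum(p * math.log(size[ki]) for ki, p in p_joint.items())  # sum_{k,i} p_k(i) I_k(i)
print(I_two_step, I_direct)   # equal, since p_k(i) = p(k) p(i|k)
```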



Following steps analogous to those in the one-level construction, the maximization of the entropy (3.1), subject to the constraint of constant average information (3.3), yields the most likely distribution of occupation numbers

$$p_k(i) \propto n_k(i) = \text{const.}\times\left(\frac{\ell_k(i)}{\varepsilon}\right)^{-\beta}, \tag{3.4}$$

where $\beta$ is again a Lagrange multiplier.
We define a new partition function

$$Z(\beta) = \sum_{k,i}\left(\frac{\ell_k(i)}{\varepsilon}\right)^{-\beta} = \sum_{k,i}\exp\left[-\beta\ln\frac{\ell_k(i)}{\varepsilon}\right], \tag{3.5}$$

from which the average information is again given by eq. (2.7). The two-level iterative process is now analogous to a canonical ensemble with partition function $Z(\beta)$, inverse "temperature" $\beta$, and a spectrum of information levels $I_k(i) = \ln(\ell_k(i)/\varepsilon)$.



FIG. 2. First three iterations in the construction of a generalized Vicsek set with $c = 2$, $N = 9$, and $R = 5$, $D = \ln 9/\ln 5 = 1.36521$.



As an example of this new model, we propose the construction of a generalized Vicsek set [3,19], defined as follows. The construction of this fractal starts from a seed (particle) of mass 1 ($k = 0$) and continues at stage $k$ by adding to the previous ($k-1$) structure $4c$ identical copies of it, evenly distributed along each of the four main branches of the set. Figure 2 depicts the generalized Vicsek set corresponding to $c = 2$. At each iterative step the mass (number of particles) of the object increases by a factor $N = 4c + 1$, and its linear length by $R = 2c + 1$; hence the dimension of the limit fractal set is $D = \ln N/\ln R$ [1]. Following the same argument as for the original Vicsek set [6], every iteration order of the generalized construction can be decomposed into a set of branches characterized by different masses (numbers of particles) $s$. As usual [6], a branch is defined by the unique continuous path that starts at the tip of the branch and ends either at the seed or on another branch of different mass. It is easy to check that the branches at each iteration level can be grouped into classes with a different mass $s_k(i)$ and an occupation number $n_k(i)$ (the number of branches of mass $s_k(i)$ in the $k$-th iteration) given by

$$n_k(i) = 2(N-1)N^{i-1}, \qquad s_k(i) = \frac{R^{k-i}-1}{2}, \tag{3.6}$$

for $i = 1,\dots,k-1$. By removing the $i$ dependence in both relations (3.6), we obtain

$$n_k(i) = \frac{2(N-1)}{N}\,N^{k}\,\bigl(2s_k(i)+1\bigr)^{-D}.$$

If we identify the magnitude characterizing branches as $\ell_k(i) = 2s_k(i) + 1$, then this expression again fits the MaxEnt prediction (3.4).
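A short check that eq. (3.6) has the MaxEnt form (3.4) with $\beta = D$: the sketch below (our own helper names; $c = 2$ and $k = 6$ are arbitrary choices) evaluates $n_k(i)\,\ell_k(i)^{D}$ for every class and divides it by $2(N-1)N^{k-1}$, which should give 1 for each $i$.

```python
import math

c = 2
N, R = 4 * c + 1, 2 * c + 1       # mass and length factors (N = 9, R = 5, as in Fig. 2)
D = math.log(N) / math.log(R)

def n_branches(k, i):
    """Eq. (3.6): n_k(i) = 2 (N - 1) N^(i-1) branches of class i at iteration k."""
    return 2 * (N - 1) * N ** (i - 1)

def branch_mass(k, i):
    """Eq. (3.6): s_k(i) = (R^(k-i) - 1) / 2, an integer because R is odd."""
    return (R ** (k - i) - 1) // 2

k = 6
ratios = []
for i in range(1, k):                 # classes i = 1, ..., k - 1
    ell = 2 * branch_mass(k, i) + 1   # l_k(i) = 2 s_k(i) + 1 = R^(k-i)
    # MaxEnt form: n_k(i) * l_k(i)^D should equal 2 (N - 1) / N * N^k for every class i.
    ratios.append(n_branches(k, i) * ell ** D / (2 * (N - 1) / N * N ** k))
print(ratios)   # all (numerically) equal to 1
```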

By introducing into the average (3.3) the actual expressions $p_k(i) = n_k(i)/\sum_{k',i'} n_{k'}(i')$ and $\ell_k(i) = R^{k-i}$ (taking $\varepsilon = 1$, the mass of the initial configuration), and extending the sum over orders to $k = 2,\dots,m$ (to avoid problems at $k = 1$) and over classes to $i = 1,\dots,k-1$, we obtain after some algebra

$$\langle I\rangle = \sum_{k=2}^{m}\sum_{i=1}^{k-1} p_k(i)\ln\ell_k(i) = \ln R\,\frac{\sum_{k=2}^{m}\sum_{i=1}^{k-1}(k-i)N^{i-1}}{\sum_{k=2}^{m}\sum_{i=1}^{k-1}N^{i-1}} = \frac{\ln R}{2}\,\frac{\sum_{k=1}^{m-1}k(k+1)N^{-(k+1)}}{\sum_{k=1}^{m-1}kN^{-(k+1)}}.$$
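The algebra behind the last equality can be verified numerically. In the sketch below (helper names are ours), the double sum over $(k, i)$ with weights $N^{i-1}$ and levels $(k-i)\ln R$ is compared with the single-sum form for a few values of $m$. For $m \to \infty$ both tend to $N\ln R/(N-1)$, and inverting that value with $D = -\ln(1 - \ln R/I)/\ln R$ (the two-level result, eq. (3.12) with $\Delta I = \ln R$) returns $\ln N/\ln R$.

```python
import math

N, R = 9, 5          # generalized Vicsek set with c = 2 (Fig. 2)

def I_double_sum(m):
    """sum_{k=2}^{m} sum_{i=1}^{k-1} p_k(i) ln l_k(i), with n_k(i) ~ N^(i-1), l_k(i) = R^(k-i)."""
    pairs = [(k, i) for k in range(2, m + 1) for i in range(1, k)]
    w = [N ** (i - 1) for k, i in pairs]
    return sum(wk * (k - i) * math.log(R) for wk, (k, i) in zip(w, pairs)) / sum(w)

def I_closed(m):
    """(ln R / 2) * sum k(k+1) N^-(k+1) / sum k N^-(k+1), with k = 1..m-1."""
    num = sum(k * (k + 1) * N ** -(k + 1) for k in range(1, m))
    den = sum(k * N ** -(k + 1) for k in range(1, m))
    return 0.5 * math.log(R) * num / den

for m in (3, 6, 12):
    print(m, I_double_sum(m), I_closed(m))

I_inf = N * math.log(R) / (N - 1)                     # m -> infinity limit
D = -math.log(1 - math.log(R) / I_inf) / math.log(R)  # two-level inversion
print(I_closed(60), I_inf, D, math.log(N) / math.log(R))
```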
By defining $Z^*_m = \sum_{k=1}^{m-1} kN^{-(k+1)}$, we have

$$\langle I\rangle = -\frac{1}{2}\,\frac{\partial\ln Z^*_m}{\partial\beta}, \tag{3.7}$$

where we have introduced the inverse temperature $\beta = D = \ln N/\ln R$, so that $N = R^{\beta}$. The partition function can be rewritten as

$$Z^*_m = \sum_{k=1}^{m-1} kN^{-(k+1)} = \sum_{k=2}^{m}(k-1)\exp(-\beta k\ln R). \tag{3.8}$$

That is, the process $V^*$ now behaves, except for an irrelevant factor 1/2, as a canonical ensemble given by the partition function (3.8), with "temperature" $1/\beta = 1/D$ and a "spectrum" of equally spaced information levels $I_k = k\ln R$, but now with a degeneracy $\Omega_k = k - 1$. This degeneracy reflects the population of classes, $\Omega_k$ in number, inside each iteration level $k$.

Again we can express $D$ in terms of purely informational magnitudes. Taking the limit $m \to \infty$ in eq. (3.8), we obtain

$$Z^*(\beta) = \lim_{m\to\infty} Z^*_m = \frac{1}{\left(e^{\beta\ln R}-1\right)^2}, \tag{3.9}$$

and introducing this expression into eq. (3.7), we have

$$\langle I\rangle = \frac{\ln R}{1-e^{-\beta\ln R}}. \tag{3.10}$$

The increment between information levels is now $\Delta I = \ln R$, so we can write

$$\langle I\rangle = \frac{\Delta I}{1-e^{-\beta\Delta I}}. \tag{3.11}$$

The constraint $\langle I\rangle = I$ finally leads to the following expression for $D$:

$$D = \beta = -\frac{1}{\Delta I}\ln\left(1-\frac{\Delta I}{I}\right). \tag{3.12}$$

Eqs. (2.12) and (3.12) can be mapped onto the single expression

$$D = \frac{\xi}{\Delta I}\ln\left(1+\xi\,\frac{\Delta I}{I}\right), \tag{3.13}$$

where $\xi = +1$ for the one-level (non-degenerate) model and $\xi = -1$ for the two-level (degenerate) model. The analytic expression of $D$ depends, of course, on the concrete details of the model considered. However, both models yield the same result in the limit $|\Delta I/I| \ll 1$, namely

$$D \simeq \frac{1}{I}. \tag{3.14}$$

In this limit we indeed recover an informational version of the well-known equipartition theorem [18], relating the average information $I$ to the "temperature" $1/\beta = 1/D$.

IV. CONCLUSIONS

In this paper we have shown how the self-similarity
of a wide class of deterministic (iteratively constructed) fractal sets can be inferred via the Maximum Entropy principle. The fit is achieved by imposing a constraint on the average amount of information required to specify the structure of the set (generating information). In this view, self-similarity arises from a variational principle, and fractal systems can therefore be associated, through equations (2.7) and (3.7), with a canonical ensemble from statistical mechanics. In this ensemble the role of the inverse temperature $\beta$ is played by the fractal dimension. This statistical analogy allows for a completely new interpretation of fractal sets and fractal dimension, in which deterministic fractals are defined as systems satisfying a constraint of constant average generating information, the latter referring to the amount of knowledge needed to recover the complete system. A variational principle (MaxEnt) yields the scaling behavior characteristic of fractal sets. The fractal dimension is defined as the Lagrange multiplier introduced in the maximization procedure, and it can be analytically computed if the actual value $I$ of the information is known. A particularly attractive point in our proposal is that we can define fractal objects and fractal dimension from first principles, with no explicit reference to the space-filling properties of the sets. This could provide a new and promising framework for the study of fractals with no geometrical counterpart.

Our conclusions can be formally extended to encompass the more usual random fractals, which in this view are systems characterized by a constant average generating information. The stability of natural fractal structures would then be ensured by the variational principle from which they arise. Moreover, our proposal is a non-trivial preliminary hint of why self-similarity is so frequently seen in Nature; that is to say, as a trend resulting from an economical way of reaching stability.